ABSTRACT 

Progressive band selection (PBS) reduces spectral redundancy without significant loss of information, thereby reducing 
hyperspectral image data volume and processing time. Used onboard a spacecraft, it can also reduce image downlink 
time. PBS prioritizes an image's spectral bands according to priority scores that measure their significance to a specific 
application. Then it uses one of three methods to select an appropriate number of the most useful bands. Key challenges 
for PBS include selecting an appropriate criterion to generate band priority scores, and determining how many bands 
should be retained in the reduced image. The image's Virtual Dimensionality (VD), once computed, is a reasonable 
estimate of the latter. We describe the major design details of PBS and test PBS in a land classification experiment. 

1. INTRODUCTION 

Hyperspectral images contain far more information than is necessary for most applications. The Hughes phenomenon 
[1], in fact, shows that trying to make use of all the data in an image yields less accurate results than if some information 
were intentionally thrown away. Discarding information therefore has two benefits: it reduces processing time and 
improves the quality of the application's output. Additionally, some hyperspectral imagers are installed on spacecraft 
with slow downlink transmitters, so removing information reduces the time needed to send the image to the ground. 

Naturally, one must carefully choose which information to throw out. Often the information needed to distinguish two 
types of vegetation (in land classification scenarios) or detect a camouflaged tank (in target detection applications) lies in 
a few critical bands. Once these bands are identified, the rest can often be discarded safely. 

We developed Progressive band selection (PBS) to detect the most critical image bands for a target application and build 
reduced images with only as many bands as are needed. The first step requires a criterion on which to judge the image 
bands and assign a numerical score. While PBS can use any real-valued function to generate these priority scores, most 
of the criteria studied here calculate a statistic (such as variance) for all pixels in the band. 

After assigning scores to each band, PBS builds a reduced image containing only the highest-scoring bands. The user can 
either directly specify the number of bands to keep, or specify an acceptable level of application performance. In the 
latter case, PBS can use one of three methods to search for the proper number of bands to retain. 
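
The scoring and selection steps described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiment: the names rank_bands and reduce_image are hypothetical, each band is assumed to be a flat sequence of pixel values, and keeping the selected bands in their original spectral order is an assumption.

```python
from statistics import pvariance


def rank_bands(bands, criterion):
    """Return band indices sorted from highest to lowest priority score."""
    scores = [criterion(band) for band in bands]
    return sorted(range(len(bands)), key=lambda i: scores[i], reverse=True)


def reduce_image(bands, criterion, keep):
    """Keep the `keep` highest-scoring bands (hypothetical helper).

    Assumption: the retained bands are returned in their original
    spectral order, not in score order.
    """
    chosen = sorted(rank_bands(bands, criterion)[:keep])
    return chosen, [bands[i] for i in chosen]


# Toy image: three bands of four pixels each.
toy = [
    [1.0, 1.0, 1.0, 1.0],  # constant band, variance 0
    [0.0, 4.0, 0.0, 4.0],  # variance 4
    [1.0, 2.0, 1.0, 2.0],  # variance 0.25
]
indices, reduced = reduce_image(toy, pvariance, keep=2)
```

Any real-valued function of a band can be passed as the criterion; population variance is used here only as an example.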

This section has described PBS and its benefits for hyperspectral image processing applications. Section 2 describes the 
criteria used to identify the most valuable bands in an image, and Section 3 describes how PBS finds the correct number 
of bands to include in its output. Section 4 describes an experiment we performed to test our band selection and output 
image sizing methodologies in a land classification application. Concluding remarks are found in Section 5. 


2. BAND SELECTION CRITERIA 

We use eight criteria in this experiment to rank image bands. Three compute central moments of the data, two more 
calculate more complex statistical measures, and three are used as experimental controls. 

2.1. Central moments 

These criteria treat the image band as a set of samples x_i from a random variable X, then compute its central moments. 
Although a data set with high variance is not automatically more useful than one with low variance, greater variance 
typically implies greater information content. Likewise, larger measures of other central moments generally indicate the 
image has features useful to the target application. 



2.1.1. Variance (second central moment) 
where x_i is the value of pixel i, x̄ is the mean value of all pixels in the band, and N is the number of pixels. 

2.1.2. Skewness (third central moment) 
where σ is the standard deviation of all pixels in the band. 

2.1.3. Kurtosis (fourth central moment) 
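
As a minimal sketch of the three central-moment criteria, the code below assumes the standard population (1/N) definitions, with skewness and kurtosis standardized by the third and fourth powers of the standard deviation. The normalization, the function names, and the zero score returned for a constant band are assumptions, not details confirmed by the text.

```python
import math


def central_moment(band, order):
    """Population central moment: mean of (x_i - mean) ** order."""
    n = len(band)
    mean = sum(band) / n
    return sum((x - mean) ** order for x in band) / n


def variance(band):
    """Second central moment."""
    return central_moment(band, 2)


def skewness(band):
    """Third central moment divided by sigma cubed (assumed normalization)."""
    var = variance(band)
    if var == 0:
        return 0.0  # assumption: a constant band scores zero
    return central_moment(band, 3) / math.sqrt(var) ** 3


def kurtosis(band):
    """Fourth central moment divided by sigma to the fourth (assumed)."""
    var = variance(band)
    if var == 0:
        return 0.0  # assumption: a constant band scores zero
    return central_moment(band, 4) / var ** 2
```

For the band [0, 0, 0, 4], the second, third and fourth central moments are 3, 6 and 21, so the kurtosis under this normalization is 21/9.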

2.2. Infinite-order statistics 

2.2.1. Entropy 

Entropy is the classic measure of information content in a data set, defined as H(x): 
where p(x) is the probability of pixel value x within the image X (calculated by constructing a histogram of image pixels) and 
N is the number of pixels in the image. This equation calculates H(x) in bits per pixel. 
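
A histogram-based entropy score can be sketched as below, assuming the usual Shannon form H = -sum p(x) log2 p(x) and one histogram bin per distinct pixel value. The binning choice and the function name are assumptions.

```python
import math
from collections import Counter


def entropy(band):
    """Shannon entropy of a band in bits per pixel.

    Assumption: pixel values are discrete (for example integer digital
    numbers), so each distinct value forms its own histogram bin.
    """
    n = len(band)
    counts = Counter(band)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A band with four equally likely values yields 2 bits per pixel, while a constant band yields 0.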


2.2.2. Information Divergence 

Information Divergence [2] is an alternative measure of information content. It calculates the number of additional bits 
needed to represent a dataset with one distribution using a code designed for another distribution. For our purposes, the 
former (denoted p below) is the image band, and the latter (g) is a dataset with a Gaussian distribution with the same 
mean and variance as the image. The more the distribution of data in the image deviates from the Gaussian model, the 
higher its information divergence (D), and the more information it is presumed to contain. 
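
One way to compute such a score is sketched below, assuming D = sum p(x) log2(p(x) / g(x)) with p taken from the band's histogram and g a Gaussian of matching mean and variance, sampled at each integer pixel level between the band's minimum and maximum and renormalized over that range. The discretization, the integer-valued pixels, and the zero score for a constant band are all assumptions.

```python
import math
from collections import Counter


def information_divergence(band):
    """Divergence (in bits) of a band's histogram from a matched Gaussian.

    Assumptions: pixel values are integers, and the Gaussian is sampled
    at every integer level in [min, max] and renormalized to sum to 1.
    """
    n = len(band)
    mean = sum(band) / n
    var = sum((x - mean) ** 2 for x in band) / n
    if var == 0:
        return 0.0  # assumption: a constant band carries no divergence
    levels = range(min(band), max(band) + 1)
    weights = {v: math.exp(-((v - mean) ** 2) / (2 * var)) for v in levels}
    total = sum(weights.values())
    counts = Counter(band)
    return sum(
        (c / n) * math.log2((c / n) / (weights[v] / total))
        for v, c in counts.items()
    )


# A bell-shaped band scores near zero; a two-spike band scores much higher.
bell = [0] * 1 + [1] * 4 + [2] * 6 + [3] * 4 + [4] * 1
spikes = [0] * 8 + [4] * 8
```

The two-spike example illustrates why this criterion can reward bands that are merely non-Gaussian rather than informative.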

2.3. Control criteria 

The remaining criteria can be thought of as naive alternatives to computing statistical measures for each band. They are 
intended as control variables for the coming experiment. 



2.3.1. Random 

The random classifier assigns a random score between 0 and 1 to each image band. 


2.3.2. Uniform 

The uniform classifier scores bands in a way that ensures the selected bands are distributed evenly across the original 
frequency range. For example, if we intend to select 100 of 200 bands from an image, the uniform classifier will assign a 
high score (between 2 and 3) to odd-numbered bands and a low score (between 0 and 1) to even-numbered bands. The 
image output by PBS will therefore contain every other band across the original range. 

2.3.3. Sensor-based 

The sensor-based classifier uses basic information about the image sensor to improve the output of the random classifier. 
This classifier assigns a zero score to bands known to be uncalibrated, masked, or of otherwise low quality, then gives 
random scores to the remaining bands. This optimization adds very little complexity to the random selection algorithm. 
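
The three control criteria can be sketched as follows. The function names, the explicit random generator argument, and the rule used to pick evenly spaced bands for band counts other than one half are assumptions; the text specifies only the 100-of-200 case.

```python
import random


def random_scores(num_bands, rng):
    """Random criterion: one score in [0, 1) per band."""
    return [rng.random() for _ in range(num_bands)]


def uniform_scores(num_bands, keep, rng):
    """Uniform criterion: scores in [2, 3) for `keep` evenly spaced bands,
    scores in [0, 1) for the rest. Requires 0 < keep <= num_bands.
    """
    step = num_bands / keep
    favored = {int(i * step) for i in range(keep)}  # zero-based indices
    return [
        rng.random() + (2.0 if band in favored else 0.0)
        for band in range(num_bands)
    ]


def sensor_scores(num_bands, bad_bands, rng):
    """Sensor-based criterion: zero for known-bad bands, random otherwise."""
    return [
        0.0 if band in bad_bands else rng.random()
        for band in range(num_bands)
    ]


rng = random.Random(0)
scores = uniform_scores(200, 100, rng)
top = sorted(range(200), key=lambda b: scores[b], reverse=True)[:100]
```

With zero-based indexing, the favored indices 0, 2, 4, ... correspond to the odd-numbered bands of the example above.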


3. PROGRESSIVE BAND SELECTION 

Progressive band selection is designed to tailor the number of bands in the reduced image for the target application. If the 
user does not directly specify the number of bands, PBS will search for the right number of bands to retain to meet an 
application-specific performance target. Figure 1 illustrates the three search strategies. 

3.1. Progressive band expansion 

Starting with an empty output image, Progressive band expansion (PBE) adds the highest-ranking image band to the 
output image and measures its performance in the target application. If higher performance is needed, PBE adds the next 
highest-ranking band and measures the performance again. This process continues until the application meets its 
performance target using the output image. 
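
The expansion loop can be sketched as below. Here `ranked` is a list of band indices ordered from highest to lowest score, and `performance` is a hypothetical callable that runs the target application on the selected bands and returns a number where higher is better; both are assumptions made for illustration.

```python
def progressive_band_expansion(ranked, performance, target):
    """Add bands in rank order until `performance` reaches `target`.

    If the target is never reached, every band is returned.
    """
    selected = []
    for band in ranked:
        selected.append(band)
        if performance(selected) >= target:
            break
    return selected


# Toy stand-in for an application: performance grows with band count.
toy_performance = lambda bands: len(bands) / 10
expanded = progressive_band_expansion(list(range(10)), toy_performance, 0.35)
```

In this toy case, three bands score 0.3 and four bands score 0.4, so the search stops at four bands.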

3.2. Progressive band reduction 

The logical inverse of PBE, Progressive band reduction (PBR) starts with a copy of the input image as its output. It then 
removes the lowest-ranking band and re-measures application performance. The process continues until enough bands 
have been removed to reduce application performance to match (or fall just below) the target level. 
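
A corresponding sketch of the reduction loop follows, again assuming a hypothetical `performance` callable and a `ranked` list ordered from highest to lowest score. Stopping with at least one band left is an added safeguard, not something the text specifies.

```python
def progressive_band_reduction(ranked, performance, target):
    """Remove the lowest-ranking band until performance matches or falls
    just below `target`.
    """
    selected = list(ranked)  # start from a copy of the full image
    while len(selected) > 1 and performance(selected) > target:
        selected.pop()  # drop the current lowest-ranking band
    return selected


toy_performance = lambda bands: len(bands) / 10
reduced_bands = progressive_band_reduction(list(range(10)), toy_performance, 0.35)
```

With the same toy application, reduction stops at three bands (0.3, just below the target), whereas expansion stops at four (0.4, just above it).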

3.3. Binary bisection band selection 

PBR and PBE search iteratively for the right number of bands to retain. By contrast, Binary bisection band selection 
(BBBS) performs a binary search. BBBS begins with an output image containing the highest-ranking 50% of bands from 
the input image (the midpoint between an empty image and the full dataset). If application performance is above the 
threshold, the lowest-ranking half of these bands are removed (leaving the highest ranking 25%). However, if 
performance was inadequate, half of the bands that were excluded from the first dataset are added, such that the output 
image consists of the highest-ranking 75% of input bands. Performance is measured again, and if necessary the number 
of bands is adjusted by a factor of 1/8 (12.5%). The process continues until performance is within an acceptable range. 
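
The bisection search can be sketched as follows. The `tolerance` parameter (the width of the acceptable range), the integer halving of the step size, and the assumption that performance does not decrease as bands are added are illustrative choices, not details taken from the text.

```python
def binary_bisection_band_selection(ranked, performance, target, tolerance):
    """Binary search over the number of top-ranked bands to keep.

    Starts at 50% of the bands and moves by 25%, 12.5%, ... of the band
    count until performance is within `tolerance` of `target` or the
    step size reaches zero.
    """
    total = len(ranked)
    count = max(1, total // 2)
    step = total // 4
    while True:
        score = performance(ranked[:count])
        if abs(score - target) <= tolerance or step == 0:
            return ranked[:count]
        count += step if score < target else -step
        step //= 2


# Record which band counts the search evaluates on a 16-band toy image.
evaluated = []


def toy_performance(bands):
    evaluated.append(len(bands))
    return len(bands) / 16


chosen = binary_bisection_band_selection(list(range(16)), toy_performance, 0.30, 0.04)
```

On this toy problem the search evaluates 8, 4, 6 and then 5 bands, compared with the five evaluations PBE would need to reach the same band count.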




Figure 1. PBE and PBR search iteratively for the correct number of bands, while BBBS employs a binary search. 


4. EXPERIMENT 


4.1. About the test image 

The test image is a small subset (shown in Figure 2) of an image taken by the Hyperion instrument onboard the NASA 
EO-1 spacecraft. The image depicts a suburban and mountainous forest area near Tucson, Arizona on June 17, 2003 at 
10-10:15 am local time [4]. The original image is 5585 pixels long; for this experiment, however, we use a 512-pixel 
subset of the image, centered over the suburban area. Both images have 256 spatial columns, and each pixel covers a 
30 m x 30 m area. There are 242 spectral bands in the test image, covering wavelengths between 400 and 2500 nm. 

Atmospheric correction of the image is not necessary since all the training data needed for both classification algorithms 
are taken from the image itself. The same atmospheric effects affect all training and test pixels. 



Figure 2. The test image is a small subset of a scene generated by EO-1 Hyperion. 




Figure 3 shows the major features in the dataset. A suburb covers the center of the image; a grid of paved streets 
encircles residential areas and has shopping malls and parking lots at its intersections. The residential areas consist of 
grassy lawns along narrow side streets. The largest expanse of paved ground is the runway at the airport, south of the 
suburbs, and the largest expanses of grass are found at the golf course just to its north. A forest fire rages in the 
mountains to the north, and while the fire itself is not included in the test image, the upper-right corner is shrouded by 
smoke from the blaze. The only detectable pools of water in this parched suburb are a lake and a small reservoir. 

Figure 3. Left: Major features in the test image. Right: map of materials present in the test image. 
Triangles indicate where spectral samples of each material were taken. 


4.2. Experiment 

Typically, PBS uses one of the three methods described above to determine an acceptable number of bands to include in 
the output image. However, one aim of this experiment is to test Virtual Dimensionality (VD) as an initial estimate of the 
output band count. VD is a measure of the number of spectrally distinct signals that combine to produce the spectra in 
the output image [3]. PBS will prepare output images with band counts equal to several multiples of VD: [0.25, 0.5, 1, 
1.5, 2, 2.5]. Since the Virtual Dimensionality of the test image is 35, we configured PBS to prepare output images with 
9, 18, 35, 53, 70 and 88 bands using each of the eight criteria. The reduced images PBS produces will serve as input to two 
land classification algorithms. 
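
The band counts follow from the multiples of VD if fractional results are rounded half up; the rounding rule is inferred from the listed values rather than stated in the text. A short check, with band_counts as a hypothetical helper name:

```python
import math


def band_counts(vd, multiples):
    """Band counts for each multiple of VD, rounding halves up.

    Python's built-in round() uses round-half-to-even and would turn
    52.5 into 52, so floor(x + 0.5) is used instead.
    """
    return [math.floor(m * vd + 0.5) for m in multiples]


counts = band_counts(35, [0.25, 0.5, 1, 1.5, 2, 2.5])
# counts == [9, 18, 35, 53, 70, 88]
```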

PBS requires a certain amount of processing time. Ideally, the time saved by processing the reduced image in place of 
the full-size image will more than offset the cost of using PBS. Figure 4 shows that the time needed to process the test 
image with PBS varies with the criterion used. The skewness and kurtosis criteria are the most computationally 
expensive, followed closely by information divergence. All three use expensive mathematical operations to generate 
priority scores, extensive processing that may not be worth the effort. 





Figure 4. PBS processing time, by criterion. Computationally complex criteria exact a steep performance penalty. 

4.2.1. Spectral angle mapping (SAM) 

Spectral angle mapping (SAM) computes the similarity of two image pixels by treating them as vectors in a 
high-dimensional space and measuring the angle between them. This technique is useful for satellite images because it ignores 
differences in the intensity of sunlight in each pixel, focusing instead on the differences in the at-sensor spectral radiance 
of the two materials. SAM makes material classification maps by comparing each pixel to a set of representative material 
spectra and coloring the map according to which material spectrum makes the smallest angle with each pixel. 
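
The angle computation and per-pixel labeling can be sketched as below. The function names and the toy reference spectra are hypothetical, and the sketch assumes every spectrum has at least one non-zero band.

```python
import math


def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors.

    Assumption: neither spectrum is all zeros (the norm must be non-zero).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def classify_pixel(pixel, references):
    """Return the name of the reference spectrum closest in angle."""
    return min(references, key=lambda name: spectral_angle(pixel, references[name]))


# Toy three-band reference spectra (illustrative values only).
references = {"grass": [1.0, 3.0, 1.0], "water": [3.0, 1.0, 1.0]}
label = classify_pixel([2.0, 6.0, 2.0], references)  # a brighter "grass" pixel
```

Scaling a spectrum by a constant leaves its angle unchanged, which is why the brighter pixel still matches the grass reference.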

The four dominant materials present in the test image are grass, pavement, soil and water. The triangles in Figure 3 show 
where pixels representing these materials were selected from the original image. The materials in these pixels were 
verified using Google Earth imagery of the same area. Figure 3 also shows a material map created using the full test 
image. In this experiment, output images from PBS are processed by SAM in order to attempt to generate the same map. 

Figure 7 shows the maps generated by SAM from each of the reduced images. The information divergence, kurtosis and 
skewness criteria failed to make reduced images with fewer than 53 bands that SAM could process. This is because the 
criteria selected mostly dark and uncalibrated bands, whose data deviate heavily from the expected Gaussian 
distribution. These criteria consider them valuable, when in fact they are just noisy. Larger 
datasets produced by these criteria (which are guaranteed to contain at least some calibrated bands) still produced poor 
material maps: kurtosis caused misclassification of the smoky area in the upper-right corner, and skewness caused most 
of the central area to be classified as pavement. 

Other criteria fared better: with as few as 18 bands (one half of the image's Virtual Dimensionality), entropy, variance 
and the three control criteria (random, uniform and sensor-based) produced images that yielded nearly correct SAM 
output. All five appeared to perform almost equally, indicating that 18 bands (1/2 x VD) is enough to perform the 
classification and still allow for slight variance in the set of bands chosen. Nine bands (1/4 x VD), however, appears to 
be a bridge too far: only the uniform classifier produced reliable output. 

Smaller images can be processed faster than larger ones, but Figure 5 shows that processing an 18-band version of the 
test image saves only three seconds over processing an 88-band version. Since all five of the non-control criteria take 
longer than three seconds to execute, PBS pre-processing is uneconomical at best. 




Figure 5. Application run time: Spectral Angle Mapper (SAM). Smaller input images require less processing time, while 
some PBS criteria take longer to evaluate the image than others. 

4.2.2. Support vector machine (SVM) 

We used libsvm 2.89 [5] to create a support vector machine (SVM)-based classifier. The classifier uses C-support vector 
classification (C-SVC), the radial basis function (RBF) kernel, and the one-against-one approach to multi-class 
classification. Using a gamma value of 2 gives the most accurate output in this configuration. 

We chose forty-eight pixels from the test image to serve as the training set, twelve for each material. The pixels were 
gathered from the regions marked with triangles in Figure 3. Training the SVM took an insignificant amount of time. 

SVM output maps, shown in Figure 8, look different from their SAM counterparts. SAM tends to find a "default" 
material choice; all pixels tend to be classified as this material (sand, in the case of the test image) unless their spectra are 
strikingly similar to one of the other materials. In SVM output maps, more pixels are classified as pavement or grass, 
even when they contain a substantial proportion of sand. 

As noted previously, the information divergence, skewness and kurtosis criteria failed to provide reduced images with 
fewer than 53 bands that could be processed by SVM. Even when allowed to select 70 or 88 bands, these three criteria 
produced images that made it easy for SVM to misclassify pixels in areas obscured by smoke. 

Results for other criteria also mirrored the results obtained using SAM. Entropy, variance and the control criteria 
produced images with as few as 18 bands (1/2 x VD) that yielded output maps that were virtually identical to those made 
with reduced images with 88 bands or the full image. Given only nine bands, SVM appears to outperform SAM: maps 
produced with the variance and sensor-based criteria are remarkably similar to those made with the full-size image. 
However, the performance of the other criteria given so few bands indicates that this output is unreliable. 

Figure 6 shows that SVM is a more efficient algorithm than SAM, making similar maps in roughly half the time. 
However, as with SAM, the time saved is far less than the time PBS spends processing the images. 






Figure 6. Application run time: Support Vector Machine (SVM). SVM is more efficient than SAM, but still benefits from 
smaller input images. 

5. CONCLUSION 

PBS attempts to reduce processing time by removing spectral redundancy from hyperspectral images. Our experiment 
confirms that in land classification scenarios smaller datasets are processed in less time than the full image yet produce 
similar output. However, the savings do not offset the overhead of running PBS with any of the five non-control 
criteria. It is worth pointing out, though, that the control criteria, meant to represent naive methods of band selection, 
produce reduced images with equal or greater quality than the non-control criteria, and ran quickly enough to make PBS 
an economical way of improving processing efficiency. Future research will search for band selection criteria that can 
outperform naive approaches. 

Figure 8. Output of support vector machine (SVM) classifier given reduced images, 